
fix(csharp): add configurable timeout for CloudFetch body reads#374

Merged
msrathore-db merged 14 commits into main from fix/cloudfetch-cancellable-body-reads
Mar 30, 2026
Conversation

@msrathore-db
Collaborator

Summary

  • Replace bare ReadAsByteArrayAsync() with CopyToAsync + explicit CancellationToken timeout for CloudFetch body downloads
  • When HttpCompletionOption.ResponseHeadersRead is used, HttpClient.Timeout may not propagate to body reads on all runtimes (e.g., Mono), leaving body reads vulnerable to permanent hangs on dead TCP connections
  • New configurable connection property: adbc.databricks.cloudfetch.body_read_timeout_minutes (default: 15 minutes)

Test plan

  • Run parallel CloudFetch download test on Windows (net472) — verify stalls recover via timeout + retry
  • Run on Mono — verify downloads no longer hang permanently
  • Verify configurable timeout works via connection property override

This pull request was AI-assisted by Isaac.

Replace bare ReadAsByteArrayAsync() with CopyToAsync + explicit
CancellationToken timeout for CloudFetch body downloads.

When HttpCompletionOption.ResponseHeadersRead is used, HttpClient.Timeout
may not propagate to body reads on all runtimes (e.g., Mono). This leaves
body reads vulnerable to permanent hangs on dead TCP connections. Using
CopyToAsync with an explicit cancellation token ensures each 81920-byte
chunk read is cancellable.

New connection property: adbc.databricks.cloudfetch.body_read_timeout_minutes
Default: 15 minutes

Closes ES-1778880

Co-authored-by: Isaac
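The mechanism described above can be sketched as follows. This is an illustrative outline only, not the driver's actual code; `ReadBodyWithTimeoutAsync` and its parameters are hypothetical names.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

static async Task<byte[]> ReadBodyWithTimeoutAsync(
    HttpResponseMessage response,
    TimeSpan bodyReadTimeout,
    CancellationToken cancellationToken)
{
    // Link the caller's token with a per-download timeout so a stalled
    // body read is cancelled without cancelling sibling downloads.
    using (var bodyTimeoutCts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken))
    {
        bodyTimeoutCts.CancelAfter(bodyReadTimeout);
        using (var contentStream = await response.Content.ReadAsStreamAsync().ConfigureAwait(false))
        using (var memoryStream = new MemoryStream())
        {
            // Each 81920-byte chunk read observes the token, so a dead
            // TCP connection is detected within one timeout interval
            // instead of hanging indefinitely.
            await contentStream.CopyToAsync(memoryStream, 81920, bodyTimeoutCts.Token).ConfigureAwait(false);
            return memoryStream.ToArray();
        }
    }
}
```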
/// on each body download to prevent permanent hangs on dead TCP connections.
/// Default value is 15 minutes if not specified.
/// </summary>
public const string CloudFetchBodyReadTimeoutMinutes = "adbc.databricks.cloudfetch.body_read_timeout_minutes";
Collaborator

Can we just reuse the existing param?
public const string CloudFetchTimeoutMinutes = "adbc.databricks.cloudfetch.timeout_minutes";

Collaborator Author

Updated. Thanks

// (e.g., Mono) when HttpCompletionOption.ResponseHeadersRead is used.
// Using CopyToAsync with an explicit token ensures dead TCP connections
// are detected on every 81920-byte chunk read.
using (var bodyTimeoutCts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken))
Collaborator

Can we just reuse the existing cancellationToken and extend that to 15mins?

Collaborator Author

That cancellation token is global and would cancel all the downloads. We want a per-file cancellation token for this use case.

using (var contentStream = await response.Content.ReadAsStreamAsync().ConfigureAwait(false))
{
int capacity = contentLength.HasValue && contentLength.Value > 0
? (int)Math.Min(contentLength.Value, int.MaxValue)
Collaborator

FWIW, it's not actually possible to allocate an array of size int.MaxValue. I don't remember the failure mode and there's a constant for this -- but only in .NET 11+. Copilot claims this value is "~2,147,483,591". Can one of these streams be larger than that size?

Collaborator Author

No, the max chunk size is around 20 MB, so this won't be exceeded. However, I'll change int.MaxValue to a smaller value.

Collaborator Author

@msrathore-db msrathore-db Mar 28, 2026

The .NET 5+ implementation does not preallocate any size for this. Following that pattern now.

Collaborator

The .NET 5+ path (ReadAsByteArrayAsync) does pre-allocate internally — LoadIntoBufferAsync uses LimitArrayPoolWriteStream which sizes from Content-Length. It's just hidden inside the framework.

The net472 path with new MemoryStream() (no capacity) means the MemoryStream starts at 256 bytes and doubles repeatedly during CopyToAsync writes. For a 20MB CloudFetch file, that's ~17 internal resize+copy operations (256 → 512 → ... → 32MB), then ToArray() copies 20MB one more time from the 32MB internal buffer to a new 20MB array.

The fix for the int.MaxValue issue should be a reasonable cap, not removing pre-allocation entirely:

int capacity = contentLength.HasValue && contentLength.Value > 0
    ? (int)Math.Min(contentLength.Value, 100 * 1024 * 1024)
    : 0;
using (var memoryStream = new MemoryStream(capacity))

CloudFetch chunks are ~20MB max, so 100MB cap is safe. This avoids the int.MaxValue problem while keeping the same allocation efficiency as the original code.


Collaborator Author

Makes sense. Since the size was user-configurable, I didn't preallocate. Changed to a 100 MB cap; the stream can still grow past it if the user configures it that way.

Collaborator Author

You're right. Restored pre-allocation with Math.Min(contentLength, 100MB) cap. Thanks for the .NET runtime reference.

…ve capacity hint

- Reuse existing `adbc.databricks.cloudfetch.timeout_minutes` (default 5 min) for body
  read timeout instead of introducing a new parameter
- Remove MemoryStream capacity pre-allocation to match .NET 5+ behavior (start empty,
  grow dynamically) — eliminates the int.MaxValue issue
- Add `using` on MemoryStream for cleanliness

Co-authored-by: Isaac
// when HttpCompletionOption.ResponseHeadersRead is used.
// Using CopyToAsync with an explicit token ensures dead TCP connections
// are detected on every 81920-byte chunk read.
using (var bodyTimeoutCts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken))
Collaborator

The body read timeout should reuse the same config.TimeoutMinutes that HttpClient.Timeout uses (from CloudFetchTimeoutMinutes). No need for a separate field — just use the same config parameter.

However, bump the default from 5 to 15 minutes. The 5-min HttpClient.Timeout only covers SendAsync (header phase with ResponseHeadersRead). ReadAsByteArrayAsync() is a separate call on HttpContent and is NOT covered by HttpClient.Timeout at all — which is the gap this PR correctly fills.

15 min is a safer default for body reads since large CloudFetch files can take longer to transfer, especially on slower networks.
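A minimal sketch of the coverage gap being described, assuming a plain HttpClient (the `DownloadAsync` wrapper and names are illustrative, not driver code):

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

static async Task<byte[]> DownloadAsync(HttpClient client, string url, CancellationToken cancellationToken)
{
    // With ResponseHeadersRead, SendAsync completes as soon as the headers
    // arrive, so HttpClient.Timeout is effectively spent on the header phase.
    using (var response = await client.GetAsync(
        url, HttpCompletionOption.ResponseHeadersRead, cancellationToken).ConfigureAwait(false))
    {
        response.EnsureSuccessStatusCode();
        // This body read is a separate call on HttpContent; on some runtimes
        // it is not covered by HttpClient.Timeout and can block indefinitely
        // on a dead TCP connection.
        return await response.Content.ReadAsByteArrayAsync().ConfigureAwait(false);
    }
}
```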

Collaborator Author

Done. Bumped DefaultCloudFetchTimeoutMinutes from 5 to 15. Single parameter controls both HttpClient.Timeout and body read CancelAfter. No separate field needed.

The 15-min default is important because we confirmed via testing on .NET Framework 4.7.2 that HttpClient.Timeout does NOT protect ReadAsByteArrayAsync body reads when ResponseHeadersRead is used — the body read is a separate call on HttpContent with no timeout coverage.

/// <summary>
/// Default timeout in minutes for individual CloudFetch body reads.
/// </summary>
public const int DefaultCloudFetchBodyReadTimeoutMinutes = 15;
Collaborator

I just mean this DefaultCloudFetchBodyReadTimeoutMinutes is probably not needed, just use DefaultCloudFetchTimeoutMinutes

Collaborator Author

Done — removed DefaultCloudFetchBodyReadTimeoutMinutes entirely. Using DefaultCloudFetchTimeoutMinutes (now 15 min) for both.

Copilot AI left a comment

Pull request overview

This PR updates the CloudFetch download path to enforce an explicit timeout during HTTP response body reads, addressing scenarios where HttpClient.Timeout may not reliably apply when using HttpCompletionOption.ResponseHeadersRead (notably on some runtimes).

Changes:

  • Plumbs CloudFetch timeout minutes from configuration into CloudFetchDownloader.
  • Replaces ReadAsByteArrayAsync() with a body-read implementation that uses a linked CancellationTokenSource with CancelAfter(...).
  • Uses a framework-conditional path: token-aware ReadAsByteArrayAsync(token) on NET5+; ReadAsStreamAsync + CopyToAsync(..., token) on older targets.


Comment on lines 99 to 102
_maxUrlRefreshAttempts = config.MaxUrlRefreshAttempts;
_urlExpirationBufferSeconds = config.UrlExpirationBufferSeconds;
_timeoutMinutes = config.TimeoutMinutes;
_memoryStreamManager = config.MemoryStreamManager;

Copilot AI Mar 29, 2026


PR description mentions a new connection property adbc.databricks.cloudfetch.body_read_timeout_minutes (default 15), but this change wires the body-read timeout to the existing CloudFetchConfiguration.TimeoutMinutes (default 5) via _timeoutMinutes = config.TimeoutMinutes. Either the implementation should use the new dedicated property (and its intended default), or the PR description/commentary should be updated to reflect that the existing timeout_minutes setting controls body reads as well.

Collaborator Author

Fixed — removed the separate body_read_timeout_minutes parameter. Now using timeout_minutes (default bumped to 15 min) for both HttpClient.Timeout and body read CancelAfter.

Comment on lines +564 to +573
// Read the file data with an explicit timeout matching HttpClient.Timeout.
// ReadAsByteArrayAsync() on net472 has no CancellationToken overload,
// and HttpClient.Timeout may not propagate to body reads on all runtimes
// when HttpCompletionOption.ResponseHeadersRead is used.
// Using CopyToAsync with an explicit token ensures dead TCP connections
// are detected on every 81920-byte chunk read.
using (var bodyTimeoutCts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken))
{
bodyTimeoutCts.CancelAfter(TimeSpan.FromMinutes(_timeoutMinutes));
#if NET5_0_OR_GREATER

Copilot AI Mar 29, 2026


This change introduces new timeout behavior for body reads (linked CTS + CancelAfter) but there doesn't appear to be an automated test covering the timeout path (e.g., a handler/content stream that stalls and verifies the download completes with a failure and the result task is completed rather than hanging). Adding a unit/E2E test for the body-read timeout would help prevent regressions, especially around cancellation vs. failure semantics and retries.

#endif
}
break; // Success, exit retry loop
}

Copilot AI Mar 29, 2026


The new body-read timeout uses a linked CTS, which can cause ReadAsByteArrayAsync/CopyToAsync to throw OperationCanceledException when the timeout elapses. If that exception escapes the retry loop on the final attempt, DownloadFileAsync will complete in the Canceled state, but the download continuation only handles t.IsFaulted and will not call downloadResult.SetFailed(...). That leaves DownloadCompletedTask never completing (CloudFetchReader awaits it) and can also leak acquired memory. Suggest ensuring timeout-triggered cancellations are converted into a fault (e.g., catch OperationCanceledException when !cancellationToken.IsCancellationRequested and throw a timeout-specific exception), or handle t.IsCanceled in the continuation by completing the result with a failure/cancellation signal.

Suggested change

      }
  }
+ catch (OperationCanceledException oce) when (!cancellationToken.IsCancellationRequested)
+ {
+     // Treat a timeout of the body read (via the linked CTS) as a fault, not a cancellation.
+     throw new TimeoutException("Timed out while reading the response body.", oce);
+ }

Collaborator Author

Good catch. Fixed — added catch (OperationCanceledException) when (!cancellationToken.IsCancellationRequested) that rethrows as TimeoutException. This ensures the download continuation sees t.IsFaulted (not t.IsCanceled) and properly completes the DownloadCompletedTask.

Comment on lines +577 to +580
using (var memoryStream = new MemoryStream())
{
await contentStream.CopyToAsync(memoryStream, 81920, bodyTimeoutCts.Token).ConfigureAwait(false);
fileData = memoryStream.ToArray();

Copilot AI Mar 29, 2026


In the net472 path, CopyToAsync into a default MemoryStream and then calling ToArray() adds at least one extra buffer copy and may cause repeated reallocations for large downloads. Since Content-Length is already read above, consider pre-sizing the stream when it fits in int (or using the configured RecyclableMemoryStreamManager when available) to reduce allocations/copies.

Suggested change

- using (var memoryStream = new MemoryStream())
- {
-     await contentStream.CopyToAsync(memoryStream, 81920, bodyTimeoutCts.Token).ConfigureAwait(false);
-     fileData = memoryStream.ToArray();
+ {
+     int capacity = (size > 0 && size <= int.MaxValue) ? (int)size : 0;
+     using (var memoryStream = capacity > 0 ? new MemoryStream(capacity) : new MemoryStream())
+     {
+         await contentStream.CopyToAsync(memoryStream, 81920, bodyTimeoutCts.Token).ConfigureAwait(false);
+         fileData = memoryStream.ToArray();
+     }
Collaborator Author

Restored pre-allocation with a 100MB cap: Math.Min(contentLength, 100 * 1024 * 1024). This matches the .NET 5+ LimitArrayPoolWriteStream behavior (which also uses Content-Length for sizing) while avoiding the int.MaxValue allocation issue. CloudFetch chunks are ~20MB max so 100MB cap is safe.

…ancellation bug

- Bump DefaultCloudFetchTimeoutMinutes from 5 to 15. HttpClient.Timeout only covers
  SendAsync (headers) when ResponseHeadersRead is used; body reads are unprotected.
  15 min is safer for both phases, especially on slower networks.
- Fix OperationCanceledException from body timeout not being handled by download
  continuation. Convert body timeout cancellation into TimeoutException so it
  propagates as a fault (t.IsFaulted), not a cancellation (t.IsCanceled) which
  would leave DownloadCompletedTask never completing and hang the reader.
- Update comments to clarify why body reads need explicit timeout protection.

Co-authored-by: Isaac
Pre-allocate MemoryStream with Content-Length when available, matching
.NET 5+'s LimitArrayPoolWriteStream which also sizes from Content-Length.
Cap at 100MB (CloudFetch chunks are ~20MB max) to avoid int.MaxValue
allocation failures while keeping allocation efficiency.

Co-authored-by: Isaac
int capacity = contentLength.HasValue && contentLength.Value > 0
? (int)Math.Min(contentLength.Value, MaxPreAllocBytes)
: 0;
var memoryStream = new MemoryStream(capacity);
Collaborator

Should we wrap the memoryStream usage with a using?

Collaborator

@eric-wang-1990 eric-wang-1990 left a comment

Get at least one stamp from @jadewang-db or @CurtHagenlocher. Make sure to run a sanity test with a large CloudFetch download and verify:

  1. correct data/number of rows returned
  2. performance should be comparable

Add large-scale benchmark for JDBC comparison: SELECT * FROM
main.tpcds_sf10_delta.store_sales (28.8M rows, 23 columns).

Co-authored-by: Isaac
@msrathore-db msrathore-db added the benchmark Run performance benchmarks on this PR label Mar 29, 2026
Keep OperationCanceledException catch for body read timeout conversion,
merge with main's new catch (Exception ex) when (!cancelled) pattern.

Co-authored-by: Isaac
@msrathore-db msrathore-db added benchmark Run performance benchmarks on this PR and removed benchmark Run performance benchmarks on this PR labels Mar 29, 2026
@github-actions

📊 CloudFetch Benchmark Results

.NET 8.0

| Query | Mean (ms) | Peak Memory (MB) | Allocated Memory (MB) | Gen2 | Rows | Cols |
| --- | --- | --- | --- | --- | --- | --- |
| catalog_sales | 2456.73 | 297.82 | 297.82 | 50 | 1441548 | 34 |
| customer | 849.86 | 37.11 | 37.11 | 2 | 100000 | 18 |
| inventory | 6870.46 | 242.10 | 242.10 | 74 | 11745000 | 5 |
| sales_with_timestamps | 3305.42 | 125.92 | 125.92 | 44 | 2880404 | 13 |
| store_sales_numeric | 2735.04 | 347.52 | 347.52 | 57 | 2880404 | 16 |
| store_sales_sf10 | 20674.98 | 3474.72 | 3474.72 | 249 | 28800501 | 23 |
| web_sales | 1427.15 | 182.61 | 182.61 | 37 | 719384 | 34 |
| wide_sales_analysis | 9233.33 | 1159.39 | 1159.39 | 89 | 2880404 | 54 |

.NET Framework 4.7.2

| Query | Mean (ms) | Peak Memory (MB) | Allocated Memory (MB) | Gen2 | Rows | Cols |
| --- | --- | --- | --- | --- | --- | --- |
| catalog_sales | 5726.15 | 532.08 | 532.08 | 9 | 1441548 | 34 |
| customer | 1823.25 | 38.85 | 38.85 | 1 | 100000 | 18 |
| inventory | 46115.50 | 264.15 | 264.15 | 6 | 11745000 | 5 |
| sales_with_timestamps | 13818.30 | 154.39 | 154.39 | 3 | 2880404 | 13 |
| store_sales_numeric | 13344.26 | 405.66 | 405.66 | 7 | 2880404 | 16 |
| store_sales_sf10 | 109081.88 | 3820.68 | 3820.68 | 63 | 28800501 | 23 |
| web_sales | 4209.08 | 401.49 | 401.49 | 7 | 719384 | 34 |
| wide_sales_analysis | 16371.73 | 2923.29 | 2923.29 | 41 | 2880404 | 54 |

🟢 Improvement | 🔴 Regression | ⚪ No change

Format: current_value (baseline) diff%

  • Baseline: Latest successful run on main branch

Metrics:

  • Mean: Execution time in milliseconds
  • Peak Memory: Total bytes allocated during operation in MB
  • Allocated Memory: Bytes allocated per operation in MB
  • Gen2: Number of Gen2 garbage collections

"expected_rows": 28800501,
"columns": 23,
"category": "large-wide"
},
Collaborator

Did you want to add this in the current PR? It looks like crossover from #376

// LimitArrayPoolWriteStream which also sizes from Content-Length internally.
// Cap at 100MB to avoid int.MaxValue allocation failures.
const int MaxPreAllocBytes = 100 * 1024 * 1024;
int capacity = contentLength.HasValue && contentLength.Value > 0
Collaborator

How often will we not know the Content-Length in advance (i.e. use a chunked transfer)? For Direct Query in the service, I believe there's a commit limit set and I'm not sure what it's set to. I'll try to get an answer on that.

Collaborator

agree on this, we should be able to get the contentLength from the http response header always

Collaborator

Okay, I've looked at four different potential hosts and there's only one case where this value is set low enough for it to be a possible concern, and that's setting the value to 512 MB. Direct Query datasets are capped at one million rows and are generally not huge, but if there are five concurrent downloads of 20 MB chunks then we would blow through that with the preallocations.

One possible mitigation would be to use p/Invoke to check for the current commit limit and be more conservative if that limit is set conservatively.

Collaborator Author

I agree. There shouldn't be a case without Content-Length; I added this only as a safety-net fallback. For the scope of this PR, I can change the capacity to 0 when Content-Length isn't available, which in practice never happens. Does that sound good?

Collaborator

Yes, that seems like a better option for now. Does this code have access to tracing? If we always expect content-length then it might be worth tracing something when it's not present.

using (var bodyTimeoutCts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken))
{
bodyTimeoutCts.CancelAfter(TimeSpan.FromMinutes(_timeoutMinutes));
#if NET5_0_OR_GREATER
Collaborator

Let's remove the #if here and use the else branch for all runtime versions.

Collaborator Author

Sure


: 0;
var memoryStream = new MemoryStream(capacity);
await contentStream.CopyToAsync(memoryStream, 81920, bodyTimeoutCts.Token).ConfigureAwait(false);
fileData = memoryStream.ToArray();
Collaborator

FWIW, if the overall architecture could be changed to return this as a Memory<byte> or ArraySegment<byte> instead of a byte[], then the original MemoryStream's buffer could just be reused (via GetBuffer or TryGetBuffer) and an allocation and copy could be avoided.

Collaborator Author

Interesting. I'll add this as a todo to investigate in a broader fix.
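A rough illustration of the buffer-reuse idea discussed above (`ToSegment` is a hypothetical helper, not part of the driver): exposing the MemoryStream's internal buffer as an ArraySegment<byte> avoids the extra allocation and full copy that ToArray() performs.

```csharp
using System;
using System.IO;

static ArraySegment<byte> ToSegment(MemoryStream memoryStream)
{
    // For streams created with the default constructor, TryGetBuffer
    // returns the internal buffer without copying. The segment is valid
    // only while the stream's buffer stays alive and unmodified.
    if (memoryStream.TryGetBuffer(out ArraySegment<byte> buffer))
    {
        return buffer;
    }
    // Fallback for streams whose buffer is not publicly visible:
    // this path still pays the allocation + copy.
    return new ArraySegment<byte>(memoryStream.ToArray());
}
```

The trade-off is lifetime management: callers must not dispose or reuse the stream while the segment is in use, which is why adopting this would need the broader architectural change mentioned above.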

…ark query

- Remove #if NET5_0_OR_GREATER — use CopyToAsync with CancellationToken for
  all runtimes, not just net472 (Jade's feedback)
- Add using on MemoryStream (Eric's feedback)
- Remove store_sales_sf10 benchmark query — belongs in PR #376 (Curt's feedback)

Co-authored-by: Isaac
Log operation_status_poller.poll_success with poll_count and
operation_state on each successful GetOperationStatus call.

Co-authored-by: Isaac
CloudFetchDownloader:
- Remove 100MB pre-allocation cap, use Content-Length directly
- Add cloudfetch.content_length_missing trace event when Content-Length absent

Poller telemetry fix:
- Activity.Current is null inside Task.Run (AsyncLocal doesn't propagate)
- Pass IActivityTracer from DatabricksCompositeReader to poller
- Wrap poll loop in TraceActivityAsync to create proper Activity span
- All poll_success, poll_error, max_failures_reached events now appear
  in adbcfile trace logs (verified)

Co-authored-by: Isaac
…r trace

Task.Delay throws OperationCanceledException on normal shutdown. Without
catching it inside PollOperationStatusCore, TraceActivityAsync logs it as
an error and sets ActivityStatusCode.Error — making normal shutdown look
like a failure in trace files. Catch and break instead.

Co-authored-by: Isaac
Move TraceActivityAsync dispatch to Start() and use single method with
Activity? parameter. No behavior change.

Co-authored-by: Isaac
@msrathore-db msrathore-db added this pull request to the merge queue Mar 30, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 30, 2026
@msrathore-db msrathore-db added this pull request to the merge queue Mar 30, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 30, 2026
…w retry

The body read timeout fires OperationCanceledException from the linked CTS.
The separate catch was converting it to TimeoutException and throwing it
outside the retry loop, preventing retries. The general catch (Exception ex)
already handles this correctly since cancellationToken.IsCancellationRequested
is false (only the linked body timeout CTS cancelled, not the parent token).

After all retries exhaust, InvalidOperationException is thrown (a fault),
so the download continuation handles it via t.IsFaulted correctly.

Fixes TimeoutRetryWithBackoff integration test.

Co-authored-by: Isaac
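The retry behavior described in this commit message could look roughly like the following sketch (`DownloadOnceAsync` and `maxRetries` are illustrative placeholders, not the actual driver code):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static async Task<byte[]> DownloadWithRetryAsync(
    string url, int maxRetries, CancellationToken cancellationToken)
{
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            // A body-read timeout cancels only the linked per-download CTS
            // inside this call; the parent token stays live.
            return await DownloadOnceAsync(url, cancellationToken).ConfigureAwait(false);
        }
        catch (Exception ex) when (!cancellationToken.IsCancellationRequested)
        {
            // Because the parent token was not cancelled, the filter matches
            // and the timeout is retried like any other transient failure.
            if (attempt >= maxRetries)
            {
                // Surface a fault (not a cancellation) so the download
                // continuation observes t.IsFaulted and completes
                // DownloadCompletedTask instead of hanging the reader.
                throw new InvalidOperationException(
                    $"Download failed after {maxRetries} attempts.", ex);
            }
        }
    }
}
```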
@msrathore-db msrathore-db added this pull request to the merge queue Mar 30, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 30, 2026
@msrathore-db msrathore-db added this pull request to the merge queue Mar 30, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 30, 2026
@msrathore-db msrathore-db added this pull request to the merge queue Mar 30, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 30, 2026
@msrathore-db msrathore-db added this pull request to the merge queue Mar 30, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Mar 30, 2026
@msrathore-db msrathore-db added this pull request to the merge queue Mar 30, 2026
Merged via the queue into main with commit 5e248be Mar 30, 2026
16 checks passed
@msrathore-db msrathore-db deleted the fix/cloudfetch-cancellable-body-reads branch March 30, 2026 23:02